SMCV: a Methodology for Detecting Transient Faults in Multicore Clusters

Authors

  • Diego Montezanti
  • Fernando Emmanuel Frati
  • Dolores Rexachs
  • Emilio Luque
  • Marcelo R. Naiouf
  • Armando De Giusti
Abstract

Processor performance continues to improve through ever-larger integration scales. This brings a growing vulnerability to transient faults, whose impact increases on multicore clusters running large scientific parallel applications. The need to enhance the reliability of these systems, coupled with the high cost of rerunning an application from the beginning, motivates software strategies designed specifically for these target systems. This paper introduces SMCV, a fully distributed technique that provides fault detection for message-passing parallel applications. It validates the contents of each message before it is sent, which prevents errors from propagating to other processes, and it leverages the intrinsic hardware redundancy of the multicore. SMCV achieves broad robustness against transient faults with reduced overhead, striking a trade-off between moderate detection latency and low additional workload.
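The idea of validating a message before it leaves the process can be illustrated with a minimal sketch. The abstract does not describe the mechanism in detail, so the sketch assumes that the same computation runs in two replicas (ideally on two cores of the same node) and that their results are compared just before the send. The names `validated_send`, `TransientFaultDetected`, `compute` and `send` are hypothetical and are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


class TransientFaultDetected(Exception):
    """Raised when the two replicas disagree on a message's contents."""


def validated_send(compute, args, send):
    """Run `compute` in two replicas and send the result only if both agree.

    Assumption: `compute` is deterministic, so any difference between the
    two results is treated as the effect of a transient fault.
    """
    # Two worker threads stand in for the two replicas of one process.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(compute, *args)
        second = pool.submit(compute, *args)
        result_a, result_b = first.result(), second.result()

    # Validate the message contents before they can reach another process.
    if result_a != result_b:
        raise TransientFaultDetected("replica outputs differ; message not sent")

    # Only one copy of the validated message is actually transmitted.
    send(result_a)
    return result_a
```

In this sketch, detection is deferred until a send occurs, so a corrupted value is caught only when it is about to be communicated. That matches the abstract's description of moderate detection latency in exchange for low additional workload. In a real message-passing application, `send` would wrap the library's send call.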

Similar resources

Transformer differential protection using the fault-generated high-frequency transient components

Power transformers are the most important components of a power system, so their protection is a critical issue. This paper proposes a novel and efficient algorithm based on the high-frequency components of the differential current signal to discriminate between the magnetizing inrush currents and the internal faults. After detecting the over-current in the differential current signals, samples...

A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints

One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...

Cross Entropy-Based High-Impedance Fault Detection Algorithm for Distribution Networks

The low fault current of high-impedance faults (HIFs) is one of the main challenges for the protection of distribution networks. The inability of conventional overcurrent relays to detect these faults results in sustained electric arcing, which causes fire and electric shock hazards and poses a serious threat to human life and network equipment. This paper presents an HIF detection algori...

A New Method for Duplicate Detection Using Hierarchical Clustering of Records

Accuracy and validity of data are prerequisites for the proper operation of any software system. There is always a possibility of errors occurring in data due to human and system faults. One of these errors is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. There must be one of them in a data source, but for some reasons like aggregation of ...

Detection of power oscillation and simultaneous faults using Clark transform

Distance relays are widely used to protect transmission lines. Sometimes, when a power oscillation occurs on these lines, the impedance calculated by the distance relay enters its operating zones and causes the lines to be disconnected. This issue can cause widespread power outages. Accordingly, in this paper, a Clark-based method for detecting the oscillation of power...

Journal title:
  • CLEI Electron. J.

Volume 15  Issue 

Pages  -

Publication date 2012